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Abstract 


Data centers are attached to the Internet or a backbone network by gateway routers. One data 
center typically has more than one gateway for commercial, load-balancing, and resiliency 
reasons. Other sites, such as access networks, also need to be connected across backbone 
networks through gateways. 


This document defines a mechanism using the BGP Tunnel Encapsulation attribute to allow data 
center gateway routers to advertise routes to the prefixes reachable in the site, including 
advertising them on behalf of other gateways at the same site. This allows segment routing to be 
used to identify multiple paths across the Internet or backbone network between different 
gateways. The paths can be selected for load-balancing, resilience, and quality purposes. 


Status of This Memo 


This is an Internet Standards Track document. 


This document is a product of the Internet Engineering Task Force (IETF). It represents the 
consensus of the IETF community. It has received public review and has been approved for 
publication by the Internet Engineering Steering Group (IESG). Further information on Internet 
Standards is available in Section 2 of RFC 7841. 


Information about the current status of this document, any errata, and how to provide feedback 


on it may be obtained at https://www.rfc-editor.org/info/rfc9125. 
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This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF 
Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this 
document. Please review these documents carefully, as they describe your rights and restrictions 
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1. Introduction 


Data centers (DCs) are critical components of the infrastructure used by network operators to 
provide services to their customers. DCs (sites) are interconnected by a backbone network, which 
consists of any number of private networks and/or the Internet. DCs are attached to the 
backbone network by routers that are gateways (GWs). One DC typically has more than one GW 
for various reasons including commercial preferences, load balancing, or resiliency against 
connection or device failure. 
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Segment Routing (SR) ([RFC8402]) is a protocol mechanism that can be used within a DC as well 
as for steering traffic that flows between two DC sites. In order for a source site (also known as 
an ingress site) that uses SR to load-balance the flows it sends to a destination site (also known as 
an egress site), it needs to know the complete set of entry nodes (i.e., GWs) for that egress DC 
from the backbone network connecting the two DCs. Note that it is assumed that the connected 
set of DC sites and the border nodes in the backbone network on the paths that connect the DC 
sites are part of the same SR BGP - Link State (LS) instance (see [RFC7752] and [RFC9086]) so that 
traffic engineering using SR may be used for these flows. 


Other sites, such as access networks, also need to be connected across backbone networks 
through gateways. For illustrative purposes, consider the ingress and egress sites shown in 
Figure 1 as separate Autonomous Systems (ASes) (noting that the sites could be implemented as 
part of the ASes to which they are attached, or as separate ASes). The various ASes that provide 
connectivity between the ingress and egress sites could each be constructed differently and use 
different technologies such as IP; MPLS using global table routing information from BGP; MPLS 
IP VPN; SR-MPLS IP VPN; or SRv6 IP VPN. That is, the ingress and egress sites can be connected by 
tunnels across a variety of technologies. This document describes how SR Segment Identifiers 
(SIDs) are used to identify the paths between the ingress and egress sites. 


The solution described in this document is agnostic as to whether the transit ASes do or do not 
have SR capabilities. The solution uses SR to stitch together path segments between GWs and 
through the Autonomous System Border Routers (ASBRs). Thus, there is a requirement that the 
GWs and ASBRs are SR capable. The solution supports the SR path being extended into the 
ingress and egress sites if they are SR capable. 


The solution defined in this document can be seen in the broader context of site interconnection 
in [SR-INTERCONNECT]. That document shows how other existing protocol elements may be 
combined with the solution defined in this document to provide a full system, but it is not a 
necessary reference for understanding this document. 


Suppose that there are two gateways, GW1 and GW2 as shown in Figure 1, for a given egress site 
and that they each advertise a route to prefix X, which is located within the egress site with each 
setting itself as next hop. One might think that the GWs for X could be inferred from the routes' 
next-hop fields, but typically it is not the case that both routes get distributed across the 
backbone: rather only the best route, as selected by BGP, is distributed. This precludes load- 
balancing flows across both GWs. 
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Figure 1: Example Site Interconnection 


The obvious solution to this problem is to use the BGP feature that allows the advertisement of 
multiple paths in BGP (known as Add-Paths) ([RFC7911]) to ensure that all routes to X get 
advertised by BGP. However, even if this is done, the identity of the GWs will be lost as soon as 
the routes get distributed through an ASBR that will set itself to be the next hop. And if there are 
multiple ASes in the backbone, not only will the next hop change several times, but the Add-Paths 
technique will experience scaling issues. This all means that the Add-Paths approach is 
effectively limited to sites connected over a single AS. 


This document defines a solution that overcomes this limitation and works equally well with a 
backbone constructed from one or more ASes using the Tunnel Encapsulation attribute 
([RFC9012]) as follows: 


When a GWtoa given site advertises a route to a prefix X within that site, it will include a 
Tunnel Encapsulation attribute that contains the union ofthe Tunnel Encapsulation 
attributes advertised by each of the GWs to that site, including itself. 


In other words, each route advertised by a GW identifies all of the GWs to the same site (see 
Section 3 for a discussion of how GWs discover each other), i.e., the Tunnel Encapsulation 
attribute advertised by each GW contains multiple Tunnel TLVs, one or more from each active 
GW, and each Tunnel TLV will contain a Tunnel Egress Endpoint sub-TLV that identifies the GW 
for that Tunnel TLV. Therefore, even if only one of the routes is distributed to other ASes, it will 
not matter how many times the next hop changes, as the Tunnel Encapsulation attribute will 
remain unchanged. 
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To put this in the context of Figure 1, GW1 and GW2 discover each other as gateways for the 
egress site. Both GW1 and GW2 advertise themselves as having routes to prefix X. Furthermore, 
GW1 includes a Tunnel Encapsulation attribute, which is the union of its Tunnel Encapsulation 
attribute and GW2's Tunnel Encapsulation attribute. Similarly, GW2 includes a Tunnel 
Encapsulation attribute, which is the union of its Tunnel Encapsulation attribute and GW1's 
Tunnel Encapsulation attribute. The gateway in the ingress site can now see all possible paths to 
X in the egress site regardless of which route is propagated to it, and it can choose one or balance 
traffic flows as it sees fit. 


2. Requirements Language 


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD 
NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in 
all capitals, as shown here. 


3. Site Gateway Auto-Discovery 


To allow a given site's GWs to auto-discover each other and to coordinate their operations, the 
following procedures are implemented: 


e A route target ([RFC4360]) MUST be attached to each GW's auto-discovery route (defined 
below), and its value MUST be set to a value that indicates the site identifier. The rules for 
constructing a route target are detailed in [RFC4360]. It is RECOMMENDED that a Type x00 or 
x02 route target be used. 

e Site identifiers are set through configuration. The site identifiers MUST be the same across all 
GWs to the site (i.e., the same identifier is used by all GWs to the same site) and MUST be 
unique across all sites that are connected (i.e., across all GWs to all sites that are 
interconnected). 

e Each GW MUST construct an import filtering rule to import any route that carries a route 
target with the same site identifier that the GW itself uses. This means that only these GWs 
will import those routes, and that all GWs to the same site will import each other's routes 
and will learn (auto-discover) the current set of active GWs for the site. 


The auto-discovery route that each GW advertises consists of the following: 


e [Pv4 or IPv6 Network Layer Reachability Information (NLRI) ([RFC4760]) containing one of 
the GW's loopback addresses (that is, with an AFI/SAFI pair that is one of the following: IPv4/ 
NLRI used for unicast forwarding (1/1); IPv6/NLRI used for unicast forwarding (2/1); IPv4/ 
NLRI with MPLS Labels (1/4); or IPv6/NLRI with MPLS Labels (2/4)). 

e A Tunnel Encapsulation attribute ([RFC9012]) containing the GW's encapsulation 
information encoded in one or more Tunnel TLVs. 
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To avoid the side effect of applying the Tunnel Encapsulation attribute to any packet that is 
addressed to the GW itself, the address advertised for auto-discovery MUST be a different 
loopback address than is advertised for packets directed to the gateway itself. 


As described in Section 1, each GW will include a Tunnel Encapsulation attribute with the GW 
encapsulation information for each of the site's active GWs (including itself) in every route 
advertised externally to that site. As the current set of active GWs changes (due to the addition of 
a new GW or the failure/removal of an existing GW), each externally advertised route will be re- 
advertised with a new Tunnel Encapsulation attribute, which reflects the current set of active 
GWs. 


If a gateway becomes disconnected from the backbone network, or if the site operator decides to 
terminate the gateway's activity, it MUST withdraw the advertisements described above. This 
means that remote gateways at other sites will stop seeing advertisements from or about this 
gateway. Note that if the routing within a site is broken (for example, such that there is a route 
from one GW to another but not in the reverse direction), then it is possible that incoming traffic 
will be routed to the wrong GW to reach the destination prefix; in this degraded network 
situation, traffic may be dropped. 


Note that if a GW is (mis)configured with a different site identifier from the other GWs to the 
same site, then it will not be auto-discovered by the other GWs (and will not auto-discover the 
other GWs). This would result in a GW for another site receiving only the Tunnel Encapsulation 
attribute included in the BGP best route, i.e., the Tunnel Encapsulation attribute of the 
(mis)configured GW or that of the other GWs. 


4. Relationship to BGP - Link State and Egress Peer 
Engineering 


When a remote GW receives a route to a prefix X, it uses the Tunnel Egress Endpoint sub-TLVs in 
the containing Tunnel Encapsulation attribute to identify the GWs through which X can be 
reached. It uses this information to compute SR Traffic Engineering (SR TE) paths across the 
backbone network looking at the information advertised to it in SR BGP - Link State (BGP-LS) 
([RFC9085]) and correlated using the site identity. SR Egress Peer Engineering (EPE) ([RFC9086]) 
can be used to supplement the information advertised in BGP-LS. 


5. Advertising a Site Route Externally 


When a packet destined for prefix X is sent on an SR TE path to a GW for the site containing X 
(that is, the packet is sent in the ingress site on an SR TE path that describes the whole path 
including those parts that are within the egress site), it needs to carry the receiving GW's SID for 
X such that this SID becomes the next SID that is due to be processed before the GW completes its 
processing of the packet. To achieve this, each Tunnel TLV in the Tunnel Encapsulation attribute 
contains a Prefix-SID sub-TLV ([RFC9012]) for X. 
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As defined in [RFC9012], the Prefix-SID sub-TLV is only for IPv4/IPV6 Labeled Unicast routes, so 
the solution described in this document only applies to routes of those types. If the use of the 
Prefix-SID sub-TLV for routes of other types is defined in the future, further documents will be 
needed to describe their use for site interconnection consistent with this document. 


Alternatively, if MPLS SR is in use and if the GWs for a given egress site are configured to allow 
GWs at remote ingress sites to perform SR TE through that egress site for a prefix X, then each 
GW to the egress site computes an SR TE path through the egress site to X and places each in an 
MPLS Label Stack sub-TLV ([RFC9012]) in the SR Tunnel TLV for that GW. 


Please refer to Section 7 of [SR-INTERCONNECT] for worked examples of how the SID stack is 
constructed in this case and how the advertisements would work. 


6. Encapsulation 


If a site is configured to allow remote GWs to send packets to the site in the site's native 
encapsulation, then each GW to the site will also include multiple instances of a Tunnel TLV for 
that native encapsulation in externally advertised routes: one for each GW. Each Tunnel TLV 
contains a Tunnel Egress Endpoint sub-TLV with the address of the GW that the Tunnel TLV 
identifies. A remote GW may then encapsulate a packet according to the rules defined via the 
sub-TLVs included in each of the Tunnel TLVs. 


7. IANA Considerations 


IANA maintains the "BGP Tunnel Encapsulation Attribute Tunnel Types" registry in the "Border 
Gateway Protocol (BGP) Tunnel Encapsulation" registry. 


IANA had previously assigned the value 17 from this subregistry for "SR Tunnel", referencing this 
document as an Internet-Draft. At that time, the assignment policy for this range of the registry 
was "First Come First Served" [RFC8126]. 


IANA has marked that assignment as deprecated. IANA may reclaim that codepoint at sucha 
time that the registry is depleted. 


8. Security Considerations 


From a protocol point of view, the mechanisms described in this document can leverage the 
security mechanisms already defined for BGP. Further discussion of security considerations for 
BGP may be found in the BGP specification itself ([RFC4271]) and in the security analysis for BGP 
([RFC4272]). The original discussion of the use of the TCP MD5 signature option to protect BGP 
sessions is found in [RFC5925], while [RFC6952] includes an analysis of BGP keying and 
authentication issues. 


The mechanisms described in this document involve sharing routing or reachability information 
between sites, which may mean disclosing information that is normally contained within a site. 
So it needs to be understood that normal security paradigms based on the boundaries of sites are 
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weakened and interception of BGP messages may result in information being disclosed to third 
parties. Discussion of these issues with respect to VPNs can be found in [RFC4364], while 
[RFC7926] describes many of the issues associated with the exchange of topology or TE 
information between sites. 


Particular exposures resulting from this work include: 


e Gateways to a site will know about all other gateways to the same site. This feature applies 
within a site, so it is not a substantial exposure, but it does mean that if the BGP exchanges 
within a site can be snooped or if a gateway can be subverted, then an attacker may learn 
the full set of gateways to a site. This would facilitate more effective attacks on that site. 


° The existence of multiple gateways to a site becomes more visible across the backbone and 
even into remote sites. This means that an attacker is able to prepare a more comprehensive 
attack than exists when only the locally attached backbone network (e.g., the AS that hosts 
the site) can see all of the gateways to a site. For example, a Denial-of-Service attack on a 
single GW is mitigated by the existence of other GWs, but if the attacker knows about all the 
gateways, then the whole set can be attacked at once. 


«A node in a site that does not have external BGP peering (i.e., is not really a site gateway and 
cannot speak BGP into the backbone network) may be able to get itself advertised as a 
gateway by letting other genuine gateways discover it (by speaking BGP to them within the 
site), so it may get those genuine gateways to advertise it as a gateway into the backbone 
network. This would allow the malicious node to attract traffic without having to have 
secure BGP peerings with out-of-site nodes. 


e An external party intercepting BGP messages anywhere between sites may learn 
information about the functioning of the sites and the locations of endpoints. While this is 
not necessarily a significant security or privacy risk, it is possible that the disclosure of this 
information could be used by an attacker. 


e If it is possible to modify a BGP message within the backbone, it may be possible to spoof the 


existence of a gateway. This could cause traffic to be attracted to a specific node and might 
result in traffic not being delivered. 


All of the issues in the list above could cause disruption to site interconnection, but they are not 
new protocol vulnerabilities so much as new exposures of information that SHOULD be protected 
against using existing protocol mechanisms such as securing the TCP sessions over which the 
BGP messages flow. Furthermore, it is a general observation that if these attacks are possible, 
then it is highly likely that far more significant attacks can be made on the routing system. It 
should be noted that BGP peerings are not discovered but always arise from explicit 
configuration. 


Given that the gateways and ASBRs are connected by tunnels that may run across parts of the 
network that are not trusted, data center operators using the approach set out in this network 
MUST consider using gateway-to-gateway encryption to protect the data center traffic. 
Additionally, due consideration MUST be given to encrypting end-to-end traffic as it would be for 
any traffic that uses a public or untrusted network for transport. 
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9. Manageability Considerations 


The principal configuration item added by this solution is the allocation of a site identifier. The 
same identifier MUST be assigned to every GW to the same site, and each site MUST have a 
different identifier. This reguires coordination, probably through a central management agent. 


Tt should be noted that BGP peerings are not discovered but always arise from explicit 
configuration. This is no different from any other BGP operation. 


The site identifiers that are configured and carried in route targets (see Section 3) are an 
important feature to ensure that all ofthe gateways to a site discover each other. Therefore, it is 
important that this value is not misconfigured since that would result in the gateways not 
discovering each other and not advertising each other. 


9.1. Relationship to Route Target Constraint 


In order to limit the VPN routing information that is maintained at a given route reflector, 
[RFC4364] suggests that route reflectors use "Cooperative Route Filtering", which was renamed 
"Outbound Route Filtering" and defined in [RFC5291]. [RFC4684] defines an extension to that 
mechanism to include support for multiple autonomous systems and asymmetric VPN topologies 
such as hub-and-spoke. The mechanism in RFC 4684 is known as Route Target Constraint (RTC). 


An operator would not normally configure RTC by default for any AFI/SAFI combination and 
would only enable it after careful consideration. When using the mechanisms defined in this 
document, the operator should carefully consider the effects of filtering routes. In some cases, 
this may be desirable, and in others, it could limit the effectiveness of the procedures. 
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